MON-4477: chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors #1305
Walkthrough
Updates Prometheus RBAC configuration and service discovery for the cluster-version-operator to support EndpointSlice resources. Adds RBAC permissions for endpointslices in the discovery.k8s.io API group and configures the ServiceMonitor to use EndpointSlice for service discovery.
Estimated code review effort: 1 (Trivial), ~3 minutes
|
/retitle MON-4477: chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors |
|
@machine424: This pull request references MON-4477 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest-required |
|
/verified by existing tests |
|
@machine424: This PR has been marked as verified by existing tests.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/retest |
|
/lgtm |
|
/retest |
wking
left a comment
Poking around to understand the context here: prometheus-operator/prometheus-operator#6672 -> prometheus-operator/prometheus-operator#3862 -> prometheus/prometheus#6838 -> the Kube docs, which explain the benefits of EndpointSlices for Services backed by many endpoints. That's not this CVO Service though; we just have the one backing endpoint, e.g. in this 4.22.0-rc.1 CI run:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2/2047770526054617088/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/endpoints.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | {name: .metadata.name, subsets}'
{
  "name": "cluster-version-operator",
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.0.95.185",
          "nodeName": "ip-10-0-95-185.ec2.internal",
          "targetRef": {
            "kind": "Pod",
            "name": "cluster-version-operator-d747d47c9-zz95q",
            "namespace": "openshift-cluster-version",
            "uid": "0e69c72e-2c44-45bc-b46a-a8008d46271d"
          }
        }
      ],
      "ports": [
        {
          "name": "metrics",
          "port": 9099,
          "protocol": "TCP"
        }
      ]
    }
  ]
}
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2/2047770526054617088/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/endpointslices.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | {name: .metadata.name, endpoints}'
{
  "name": "cluster-version-operator-l52jj",
  "endpoints": [
    {
      "addresses": [
        "10.0.95.185"
      ],
      "conditions": {
        "ready": true,
        "serving": true,
        "terminating": false
      },
      "nodeName": "ip-10-0-95-185.ec2.internal",
      "targetRef": {
        "kind": "Pod",
        "name": "cluster-version-operator-d747d47c9-zz95q",
        "namespace": "openshift-cluster-version",
        "uid": "0e69c72e-2c44-45bc-b46a-a8008d46271d"
      },
      "zone": "us-east-1c"
    }
  ]
}
However, MON-4477 points out:
Endpoints API is deprecated https://kubernetes.io/blog/2025/04/24/endpoints-deprecation/
And that's a great reason to move off the deprecated-in-Kubernetes-1.33 API, which this pull delivers. OCP 4.22 is based on Kubernetes 1.35, so I'm unclear on why we haven't been getting APIRemovedInNextReleaseInUse alerting since OCP 4.20. Possibly that's because there is no clear plan to remove Endpoints. I opened PI-1510 back in 2022 asking after alert coverage for deprecated APIs, but that ticket's been pretty quiet.
But context aside, looks good to me, thanks! Coverage is well-exercised in pre-merge CI (e.g. TargetDown and ClusterVersionOperatorDown alerting would fail us if we broke CVO monitoring), so no risk of destabilizing other CI or creating QE load:
/lgtm
/label acknowledge-critical-fixes-only
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: machine424, simonpasquier, wking
The full list of commands accepted by this bot can be found here. The pull request process is described here. |
Correct, the Endpoints API is so ingrained in Kubernetes that it's likely never going to be removed. |
|
@machine424: The following tests failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PR migrates Prometheus service discovery from the deprecated Endpoints API to the EndpointSlices API, by:
- Setting serviceDiscoveryRole: EndpointSlice on ServiceMonitors.
- Adding endpointslices permissions to the Prometheus Role.
We're taking a conservative approach by keeping the existing endpoints permissions alongside the new endpointslices ones. This provides a safety net in case any ServiceMonitors, whether deployed from this repo or from another source, still rely on the same Role and were missed during the migration.
That said, since both resources provide essentially the same data, keeping both isn't meaningfully more permissive from a security standpoint.
These changes target OpenShift 4.22+ and should not be backported to earlier releases.
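For readers who want to picture the shape of the change, here is a minimal sketch of the two manifests involved. It is not the actual diff from this PR: the Role name, the pre-existing rule, the selector label, and the scrape interval are assumptions made for illustration. Only the namespace, the Service name, the metrics port name, the discovery.k8s.io/endpointslices rule, and the serviceDiscoveryRole field come from the discussion above.

```yaml
# Sketch only. Names and labels marked "assumed" are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s                 # assumed Role name
  namespace: openshift-cluster-version
rules:
# Pre-existing permissions (assumed shape), kept as a safety net for any
# ServiceMonitor that still discovers targets through Endpoints.
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
# New rule: lets Prometheus read EndpointSlices for service discovery.
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-version-operator
  namespace: openshift-cluster-version
spec:
  # Discover scrape targets from EndpointSlices instead of Endpoints.
  serviceDiscoveryRole: EndpointSlice
  endpoints:
  - port: metrics                      # port name seen in the CI artifacts above
    interval: 30s                      # assumed
  selector:
    matchLabels:
      k8s-app: cluster-version-operator  # assumed label
```

The Role rule has to be in place before (or together with) the ServiceMonitor switch; otherwise Prometheus would be asked to watch endpointslices without permission to list them and the target would drop out of discovery.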